
    Reliably Calibrated Isotonic Regression

    Using classifiers for decision making requires well-calibrated probabilities for estimating expected utility. Furthermore, knowledge of the reliability of these estimates is needed to quantify uncertainty. Outputs of most classifiers can be calibrated, typically by using isotonic regression, which bins classifier outputs together to form empirical probability estimates. However, especially for highly imbalanced problems, it produces bins with few samples, resulting in probability estimates with very large uncertainty. We provide a formal method for quantifying the reliability of calibration and extend isotonic regression to provide reliable calibration with guarantees on the width of the credible intervals of the probability estimates. We demonstrate the method by calibrating purchase probabilities in e-commerce and achieve a significant reduction in uncertainty without compromising accuracy.
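
    As a point of reference, a minimal sketch of plain isotonic calibration with scikit-learn, assuming held-out classifier scores and binary labels; the reliability-constrained extension proposed above is not part of the library:

```python
# Minimal sketch of plain isotonic calibration (not the reliability-constrained
# extension from the paper), assuming held-out classifier scores and binary labels.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
scores = rng.uniform(size=1000)                     # raw classifier outputs on a calibration set
y = (rng.uniform(size=1000) < scores).astype(int)   # binary outcomes

calibrator = IsotonicRegression(out_of_bounds="clip", y_min=0.0, y_max=1.0)
calibrator.fit(scores, y)                 # learns a monotone, piecewise-constant mapping

calibrated = calibrator.predict(scores)   # calibrated probability estimates
```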

    Exploring uplift modeling with high class imbalance

    Uplift modeling refers to individual-level causal inference. Existing research on the topic ignores one prevalent and important aspect: high class imbalance. For instance, in online environments uplift modeling is used to optimally target ads and discounts, but very few users ever end up clicking an ad or buying. One common approach to dealing with imbalance in classification is undersampling the dataset. In this work, we show how undersampling can be extended to uplift modeling. We propose four undersampling methods for uplift modeling, compare them empirically, and show when some methods have a tendency to break down. One key observation is that accounting for the imbalance is particularly important for uplift random forests, which explains the poor performance of the model in earlier works. Undersampling is also crucial for models based on the class-variable transformation.
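
    To illustrate the general idea, a hypothetical sketch that undersamples non-converting users separately within the treatment and control groups; this is an illustration only, not one of the four proposed methods:

```python
# Hypothetical sketch of undersampling for uplift data: keep all positives and a
# random fraction of negatives, separately within treatment and control groups.
import numpy as np

def undersample_uplift(X, y, t, keep_negative=0.1, seed=0):
    """X: features, y: binary outcome, t: binary treatment indicator."""
    rng = np.random.default_rng(seed)
    keep = np.zeros(len(y), dtype=bool)
    for group in (0, 1):                      # control and treatment groups
        in_group = (t == group)
        keep |= in_group & (y == 1)           # keep every positive sample
        negatives = np.flatnonzero(in_group & (y == 0))
        chosen = rng.choice(negatives, size=int(keep_negative * len(negatives)),
                            replace=False)
        keep[chosen] = True                   # keep only a fraction of negatives
    return X[keep], y[keep], t[keep]
```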

    Importance Sampled Stochastic Optimization for Variational Inference

    Variational inference approximates the posterior distribution of a probabilistic model with a parameterized density by maximizing a lower bound on the model evidence. Modern solutions fit a flexible approximation with stochastic gradient descent, using Monte Carlo approximation for the gradients. This enables variational inference for arbitrary differentiable probabilistic models and consequently makes variational inference feasible for probabilistic programming languages. In this work, we develop more efficient inference algorithms for the task by considering importance sampling estimates for the gradients. We show how the gradient with respect to the approximation parameters can often be evaluated efficiently without needing to re-compute gradients of the model itself, and then proceed to derive practical algorithms that use importance sampled estimates to speed up computation. We present importance sampled stochastic gradient descent, which outperforms standard stochastic gradient descent by a clear margin for a range of models, and provide a justifiable variant of stochastic average gradients for variational inference.
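
    A toy sketch of the reuse principle, assuming a one-dimensional Gaussian approximation of a standard-normal model: gradients computed at samples from an earlier approximation are reweighted with self-normalized importance weights instead of being recomputed. This illustrates the idea only, not the paper's estimator:

```python
# Toy illustration of reusing model gradients: estimate an expectation under the
# current approximation q_new using samples (and already-computed gradients) drawn
# under an earlier approximation q_old, via self-normalized importance sampling.
import numpy as np
from scipy.stats import norm

grad_log_p = lambda z: -z                    # toy model: p(z) = N(0, 1)

q_old = dict(mean=0.5, std=1.2)              # approximation at the previous step
q_new = dict(mean=0.4, std=1.1)              # approximation after a small update

rng = np.random.default_rng(0)
z = rng.normal(q_old["mean"], q_old["std"], size=2000)   # samples from q_old
g = grad_log_p(z)                            # model gradients, computed once

# Importance weights q_new(z) / q_old(z), self-normalized for stability.
w = norm.pdf(z, q_new["mean"], q_new["std"]) / norm.pdf(z, q_old["mean"], q_old["std"])
w /= w.sum()

reused_estimate = np.sum(w * g)              # expectation under q_new, no new model evaluations
fresh_estimate = grad_log_p(rng.normal(q_new["mean"], q_new["std"], 2000)).mean()
```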

    Few-to-few Cross-domain Object Matching

    Cross-domain object matching refers to the task of inferring an unknown alignment between objects in two data collections that do not have a shared data representation. In recent years several methods have been proposed for solving the special case that assumes each object is to be paired with exactly one object, resulting in a constrained optimization problem over permutations. A related problem formulation, cluster matching, seeks to match a cluster of objects in one data set to a cluster of objects in the other set; it can be considered a many-to-many extension of cross-domain object matching and can be solved without explicit constraints. In this work we study the intermediate region between these two special cases, presenting a range of Bayesian inference algorithms that work also for few-to-few cross-domain object matching problems, where constrained optimization is necessary but the optimization domain is broader than just permutations.
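
    For context, a sketch of the one-to-one special case, assuming a cross-domain cost matrix is available: the Hungarian algorithm finds the optimal permutation, which is exactly the constraint that few-to-few matching relaxes:

```python
# Sketch of the one-to-one special case: given a cost matrix between objects in the
# two collections (e.g. negative similarity after mapping to a shared space), find
# the optimal permutation. The few-to-few setting generalizes this constraint.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
cost = rng.uniform(size=(6, 6))              # hypothetical cross-domain matching costs

rows, cols = linear_sum_assignment(cost)     # Hungarian algorithm over permutations
matching = list(zip(rows, cols))             # object i in set A paired with object j in set B
```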

    Learning to lemmatize in the word representation space

    Lemmatization is often used with morphologically rich languages to address issues caused by morphological complexity, and it is typically performed by grammar-based lemmatizers. We propose an alternative in the form of a tool that performs lemmatization in the space of word embeddings. Word embeddings as distributed representations natively encode some information about the relationship between base and inflected forms, and we show that it is possible to learn a transformation that approximately maps the embeddings of inflected forms to the embeddings of the corresponding lemmas. This facilitates an alternative processing pipeline that replaces traditional lemmatization with the lemmatizing transformation in downstream processing for any application. We demonstrate the method on the Finnish language, outperforming traditional lemmatizers in an example task of document similarity comparison, but the approach is language-independent and can be trained for new languages with mild requirements.
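
    A hypothetical sketch of such a lemmatizing transformation, assuming pre-trained embeddings and a small set of (inflected form, lemma) training pairs are available; the function and variable names are illustrative only:

```python
# Hypothetical sketch: learn a linear map from inflected-form embeddings to lemma
# embeddings by least squares, then "lemmatize" a word by mapping its vector and
# returning the nearest lemma vector. Assumes pre-trained embeddings are available.
import numpy as np

def fit_lemma_map(inflected_vecs, lemma_vecs):
    """Rows are embeddings of (inflected form, lemma) training pairs."""
    W, *_ = np.linalg.lstsq(inflected_vecs, lemma_vecs, rcond=None)
    return W                                   # maps inflected space -> lemma space

def lemmatize(vec, W, lemma_vocab_vecs, lemma_vocab):
    mapped = vec @ W                           # project into the lemma region of the space
    sims = lemma_vocab_vecs @ mapped / (
        np.linalg.norm(lemma_vocab_vecs, axis=1) * np.linalg.norm(mapped) + 1e-9)
    return lemma_vocab[int(np.argmax(sims))]   # nearest lemma by cosine similarity
```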

    Modeling of mutual dependencies

    Data analysis means applying computational models to analyze large collections of data, such as video signals, text collections, or measurements of gene activities in human cells. Unsupervised or exploratory data analysis refers to a subtask of data analysis in which the goal is to find novel knowledge based on the data alone. A central challenge in unsupervised data analysis is separating relevant and irrelevant information from each other. In this thesis, novel solutions for focusing on the more relevant findings are presented.

    Measurement noise is one source of irrelevant information. If we have several measurements of the same objects, the noise can be suppressed by averaging over the measurements. Simple averaging is, however, only possible when the measurements share a common representation. In this thesis, we show how irrelevant information can be suppressed or ignored also in cases where the measurements come from different kinds of sensors or sources, such as video and audio recordings of the same scene. For combining the measurements, we use mutual dependencies between them. Measures of dependency, such as mutual information, characterize commonalities between two sets of measurements. Two measurements can hence be combined to reduce irrelevant variation by finding new representations for the objects so that the representations are maximally dependent. The combination is optimal given the assumption that what is in common between the measurements is more relevant than information specific to any one of the sources.

    Several practical models for the task are introduced. In particular, novel Bayesian generative models, including a Bayesian version of the classical method of canonical correlation analysis, are given. Bayesian modeling is an especially well-justified approach to learning from small data sets, so generative models can be used to extract dependencies more reliably in, for example, medical applications, where obtaining a large number of samples is difficult. Novel non-Bayesian models are also presented: dependent component analysis finds linear projections that capture more general dependencies than earlier methods. Mutual dependencies can also be used for supervising traditional unsupervised learning methods. The learning metrics principle describes how a new distance metric focusing on relevant information can be derived based on the dependency between the measurements and a supervising signal. In this thesis, the approximations and optimization methods required for using the learning metrics principle are improved.
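
    For a concrete sense of the classical building block mentioned above, a brief example of (non-Bayesian) canonical correlation analysis with scikit-learn on synthetic two-view data:

```python
# Example of the classical building block discussed above: canonical correlation
# analysis finds projections of two views of the same objects that are maximally
# correlated (here with scikit-learn; the thesis develops a Bayesian version).
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
shared = rng.normal(size=(200, 2))                     # latent signal shared by both views
X = shared @ rng.normal(size=(2, 10)) + 0.5 * rng.normal(size=(200, 10))
Y = shared @ rng.normal(size=(2, 8)) + 0.5 * rng.normal(size=(200, 8))

cca = CCA(n_components=2)
X_c, Y_c = cca.fit_transform(X, Y)                     # maximally dependent representations
correlations = [np.corrcoef(X_c[:, k], Y_c[:, k])[0, 1] for k in range(2)]
```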

    Transfer-Learning Methods in Programming Course Outcome Prediction

    The computing education research literature contains a wide variety of methods that can be used to identify students who are either at risk of failing their studies or who could benefit from additional challenges. Many of these are based on machine-learning models that learn to make predictions from previously observed data. However, in educational contexts, differences between courses pose huge challenges for the generalizability of these methods. For example, traditional machine-learning methods assume an identical distribution in all data; in our terms, they assume that all teaching contexts are alike. In practice, data collected from different courses can be very different, as a variety of factors may change, including grading, materials, teaching approach, and the students. Transfer-learning methodologies have been created to address this challenge. They relax the strict assumption of identical distributions for training and test data, although some similarity between the contexts is still needed for efficient learning. In this work, we review the concept of transfer learning, especially for the purpose of predicting the outcome of an introductory programming course, and contrast the results with those from traditional machine-learning methods. The methods are evaluated using data collected in situ from two separate introductory programming courses. We empirically show that transfer-learning methods are able to improve the predictions, especially in cases with a limited amount of training data, for example, when making early predictions for a new context. The difference in predictive power is, however, rather subtle, and traditional machine-learning models can be sufficiently accurate provided that the contexts are closely related and the features describing student activity are carefully chosen to be insensitive to the fine differences.
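
    As one hypothetical illustration of the transfer idea (not necessarily one of the methods evaluated here), a sketch that pools source-course data with scarce target-course data while down-weighting the source samples:

```python
# Hypothetical sketch of one simple transfer strategy: pool data from a previous
# course with the few observations available from the new course, down-weighting
# the source samples so the target context dominates where data exist.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_transfer(X_source, y_source, X_target, y_target, source_weight=0.3):
    X = np.vstack([X_source, X_target])
    y = np.concatenate([y_source, y_target])
    weights = np.concatenate([np.full(len(y_source), source_weight),
                              np.ones(len(y_target))])
    model = LogisticRegression(max_iter=1000)
    model.fit(X, y, sample_weight=weights)    # source data helps when target data is scarce
    return model
```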